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Abstract —The operating status of power systems is influenced 
by growing varieties of factors, resulting from the developing sizes 
and complexity of power systems; in this situation, the model- 
based methods need be revisited. A data-driven method, as the 
novel alternative, on the other hand, is proposed in this paper: it 
reveals the correlations between the factors and the system status 
through statistical properties of data. An augmented matrix, as 
the data source, is the key trick for this method; it is formulated 
by two parts: 1) status data as the basic part, and 2) factor data 
as the augmented part. The random matrix theory (RMT) is 
applied as the mathematical framework. The linear eigenvalue 
statistics (LESs), such as the mean spectral radius (MSR), 
are defined to study data correlations through large random 
matrices. Compared with model-based methods, the proposed 
method is inspired by a pure statistical approach, without a prior 
knowledge of operation and interaction mechanism models for 
power systems and factors. In general, this method is direct in 
analysis, robust against bad data, universal to various factors, 
and applicable for real-time analysis. A case study, based on the 
standard IEEE 118-bus system, validates the proposed method. 

Index Terms —correlation analysis, power systems, big data 
analytics, augmented matrix, random matrix theory, linear eigen¬ 
value statistics 


I. Introduction 

T he operating status of power systems is affected by 
numerous factors. It is fundamental to understand the 
statistical correlations between those factors and power sys¬ 
tems. These correlations reveal the causes to disturbances 
and faults 111]. Nowadays, power systems, large in sizes and 
complex in structure, are penetrated by more and more various 
elements, such as distributed generations, flexible loads, and 
electric vehicles. All these elements lead to strong interaction, 
multiple coupling, and high randomness in power systems. On 
this occasion, model-based methods, establishing mechanism 
models with assumptions and simplifications as essential pre¬ 
conditions, are questionable. 

Data have become a strategic resource in power systems. 
The 4Vs data |2], with great potential value, are hard to 
handle by conventional model-based methods. This situation 
leads to an emerging paradigm—big data analytics—for power 
systems. Big data analytics aims to work out statistical features 
measured by the eigenvalue statistics, without establishing 
mechanism models. The linear eigenvalue statistics (LESs) are 
of central interest in statistics yt). When the matrix size is 
sufficiently large, LESs tend to deterministic limiting values 
(expected values). Various forms of the central limit theorems 
are also established in recent statistical papers. The statistical 
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error between the limiting expected value and the eigenvalue 
statistics will be decreasing as a function of data size. For 
more details on the convergence rate, we refer to 0). 

Many studies on bm data analytics have been achieved 
in power systems O-tzl]. In our previous work, a universal 
architecture with big data analytics is proposed iH], and 
applied successfully in many fields, such as the anomaly 
detection 0, and the 3D power map for situational awareness 
liol] . This paper is built upon our previous paper 0 in that 
the augmented matrix contains a remarkable rich statistical 
information between two (or more) block matrices. The trick is 
the observation that a matrix of block random submatrices are 
also a (large) random matrix. This simple observation delivers 
many interesting results that are useful in large power systems. 

A. Contribution 

This paper, based on random matrix theory (RMT), proposes 
a data-driven method to reveal the correlations between factors 
and the power system status. An augmented matrix, formed 
in a certain manner, is presented as the data source. For each 
factor, the augmented matrix combines status data (as the basic 
part), on one hand, and factor data (as the augmented part), 
on the other hand. According to specific researching purposes, 
status data can be voltages, frequencies, currents and power 
flows, while factor data can be loads, distributed generation, 
wind speed, temperature, electricity price, etc. Then, using the 
big data architecture proposed previously 0, we conduct real¬ 
time analysis based on the augmented matrix, and compare 
findings with the RMT theoretical predictions (i.e. Ring Law 
and Marchenko-Pastur Law). During this procedure, the mean 
spectral radius (MSR), a special case of LESs, is used to 
indicate data correlations; the kernel density estimation (KDE) 
is used as an assisted indicator. 

In general, the proposed method extracts the correlations 
in the form of the eigenvalue statistics of measured data. The 
method involves no knowledge of topologies and parameters of 
power systems, and is universal to various factors. Besides, the 
method is robust against random fluctuations in power systems 
and measuring errors in data. Furthermore, the proposed 
method is practical for both real-time analysis and off-line 
analysis, depending on the split-window. 

B. Related Work 

Current researches on correlation analysis are mainly 
model-based methods, for which the mechanism models are 
essential preconditions. These mechanism models are estab¬ 
lished based on assumptions and simplifications, and used for 
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specific power systems and factors. Lian studied the effect 
of dynamic load characteristics on the voltage stability and 
sensitivity in power systems, using the P-V and the Q-V 
curves ll ill . In Lian’s method, the power system is equivalent 
to an decentralized system; dynamic loads are approximated 
as differential equations. These processes increase the com¬ 
plexion and inaccuracy of the analysis. Parinya proposed 
a stochastic stability index to investigate the small signal 
stability of power systems incorporating wind power 11211 . 
The status space equations and energy functions need to be 
rewritten when the grid changes, and the test system is too 
small in scale to convince. 

Also, some data-driven methods for correlation analysis are 
proposed recently, such as the principal components analysis, 
the artificial neural networks, the support vector machine 1131] . 
Eltigani utilizes artificial neural networks (ANNs) in assessing 
the transient stability Q. In his approach, the power system is 
described by an equivalent single machine infinite bus system, 
which cannot reflect accurately the actual state of the system. 
Moreover, with the scale-up of the system and increase of 
training samples, the training speed of ANNs progressively 
slows down. 

II. Random Matrix Theory and Data Processing 

The frequently used notations in this section are given in 
Table. H 


TABLE I: Notations for RMT and data processing 


Notations 

Means 

X, X, X, Xi j 

a matrix, a vector, a single value, an entry of a matrix 

X", 

transpose of a matrix and a vector 

m(x), 0 - 2 (x) 

mean, variance for x 

n 

the data source 

CiVxT 

N X T dimensional complex space 

N, T 

the row size and the column size of the split window 

X 

a raw data matrix 

X 

a standard non-Hermitian matrix 

x„ 

the singular value equivalent of X 

z 

the matrix product 

z 

the standard matrix product 

'^z 

eigenvalues of Z 

|A| 

radius of eigenvalue A on the complex plane 

kmsr 

the mean spectral radius 

S 

the sample covariance matrix 


A. Random Matrix Theory 

The random matrix theory (RMT), developed from several 
different sources in the early 20th century, is one of the 
statistical foundations for big data analytics. It is used as an 
important mathematical tool in various fields, namely, physics, 
finance, wireless communication engineering, etc. 

Massive data can be naturally represented by large random 
matrices lll5n . The random matrix model is the most general: 
rectangular and complex. In our formulation, we view the N 
variables as space samples of a random network (or graph). Eor 
each variable, we make T observations. As a result, a random 
matrix of x T is obtained as our data matrix, which is the 
starting part for our analysis. 


According to RMT, when the dimensions of a random 
matrix are sufficiently large, the empirical spectral distribution 
(ESD) of its eigenvalues converges to some theoretical limits, 
such as Ring Law and Marchenko-Pastur Law (M-P Law) 
illbll . Eor details on the Ring Law and M-P Law, please 
refer to Appendix A. It is noted that although the asymptotic 
convergence in RMT is considered under infinite dimensions, 
the asymptotic results are remarkably accurate for relatively 
moderate matrix sizes such as tens. This is the very reason 
why RMT can be used in practical world. 


B. Real-time Data Processing for RMT 

Currently, there exists no general standardized definition 
for big data. In this paper, we use a mathematical definition 
proposed in our previous work ijst]. In a power system, assume 
that there are n kinds of measurable status variables. At the 
sampling time U, measured data of these variables are formed 
as a column vector x(fi) = (xi, 12 ,..., in) ^ 10. Eor a 
series of time, we can arrange these vectors x in chronological 
order to form a matrix as a data source for further analysis 
(x e 12). 

Within 12, we can obtain a raw data matrix X € 
arbitrarily by using a split-window. Then, we convert it into a 
standard non-Hermitian matrix X with following algorithms. 

= + (1) 

where x^ = p{^i) = 0, a{Sci) = 1 for 

2 = 1, 2,..., N and j = 1, 2,..., T. The matrix X„ G 
introduced as the singular value equivalent of X by 

Xn = v'xx^u (2) 

where U G ^ Haar unitary matrix and X^X^ = 

XX^. 

Eor multiple arbitrarily assigned standard non-Hermitian 
matrices Xi ^=1,2,..., L), the matrix product is obtained 
byz=nt X„ j. Then, Z is converted to the standard matrix 
product Z by 

= 7n^) 

where % = {zi^i,Zi^ 2 , ■ • ■ ,Zi,N) and z* = Zi, 2 , ■ • ■ ,-Si,7v)- 

Eor the standard matrix product Z, the sample covariance 
matrix is obtained by 

S = ^YY^ = ZZ^ (4) 

where Y = v/fVZ, and o'^(yi) = = 1. 

In order to conduct real-time analysis, we use a specific 
split-window to obtain the raw data matrix X from 12, namely, 
the real-time split-window. The real-time split-window trun¬ 
cates measured data at continuous sampling times, where the 
last sampling time is the current time. In other words, at the 
sampling time U, the raw data matrix Xj. is formed by 

X(fi) = (x(fi_T+i), x(fi_r+2), ■ ■ •, x(fi)) (5) 

where x(fj) = {xi, £ 2 , ■ ■ ■, xn)^ is measured data at the 
sampling time tj. 










3 


The data processing procedure above is organized as fol¬ 
lowing steps. The standard matrix product Z is calculated for 
Ring Law; the sample covariance matrix S is calculated for 
M-P Law. For simplicity, we set L = 1 in the matrix product 

Z. 


Steps of Data Processing for Ring Law 

1) At the sampling time ti, obtain the raw data matrix X(ti) 
from the data source ii. 

2) Convert X(ti) into the standard non-Hermitian matrix X(ti). 

3) Calculate the singular value equivalent X„(ti) of X(ti). 

4) Form the matrix product Z{fi) = Xu(ti). 

5) Convert Z{fi) into the standard matrix product Z{fi). 

6) Calculate the sample covariance matrix S(ti) = Z(ti)Z(ti)^ 


C. Linear Eigenvalue Statistics and Kernel Density Estimation 

1) Linear Eigenvalue Statistics: 

The linear eigenvalue statistics (LESs), major focuses in this 
paper, indicate the statistical characteristics of large random 
matrices. The linear eigenvalue statistic of a random matrix 
X is defined as 

n 

where \i {i = 1, 2,..., n) are eigenvalues of X, and ip{-) is a 
test function. 

The mean spectral radius (MSR), a special case of LESs, 
is used to reflect the eigenvalue distribution of the standard 
matrix product Z; it is defined as the mean distribution radius 
of eigenvalues, formulated by 

1 ^ 

kmsr = (”7) 

^ i=l 

where j (* = 1, 2,..., N) are eigenvalues of Z, and jAg J is 
the radius of the eigenvalue Ag ^ on the complex plane. Based 
on MSR, we define the variance of spectral radius (VSR) as 
a further reflection of eigenvalue distribution; it reflects the 
dispersion degree of the eigenvalues of Z, formulated by 

1 ^ 

^ N -I “ «^msr)^ (8) 

^ i=l 

Note that since these complex eigenvalues Ag ^ (* = 

1,2,...,A^) are highly correlated random variables (each is 
a complicated matrix function of random matrices Xi {i = 
1, 2,..., L), kmsr and kvsr are both random variables. 

2) Kernel Density Estimation: 

The kernel density estimation (KDE) depicts the ESD of 
the sample covariance matrix S by following PDE 

/KDE(As) = ^f]K(^" 

i=l 

where As,i (i = l,2,.. .N) are eigenvalues of S, and K(-) is 
the kernel function for the bandwidth parameter h. 


TABLE II: Notations for correlation analysis 


Notations 

Means 

n 

the number of status variables of power systems 

m 

the number of influential factors in power systems 

t 

the study duration 

ts 

the sampling time 

B 

the status matrix 

c 

a factor vector 

D 

the matrix duplicating c for k times 

E 

the noise matrix 

me 

the magnitude of white noise 

c 

the factor matrix 

p 

the signal-to noise ratio 

A 

the augmented matrix 


III. A Correlation Analysis Method for Power 
Systems based on Augmented Matrices 

The frequently used notations are given in TableHJ 


A. Augmented Matrix Method for Power Systems 

Different factors have different effects on power systems 
in their own ways. According to big data analytics, the rela¬ 
tionships between factors and power systems can be revealed 
by data correlations. Based on RMT, it is feasible to extract 
correlations from a data source including both status data and 
factors data, namely, augmented matrix. 

In a power system, assume that there are n types of status 
variables and m types of influential factors, both of which are 
measurable. In a study period ti (i = 1, 2,..., f), measured data 
of each factor are formed as a row vector Cj G (j = 

l,2,...,m) (i.e. factor vector), and measured data of status 
variables for the power system are formed as a matrix B € 
(j^nxt (-j g sfatus matrix). 

In order to balance the proportion of factors data and status 
data in the data source, we form a factor matrix for each factor 
vector. Eirst, for the factor Cj, we duplicate it for k times to 
construct a matrix Dj, formulated by 



(j = 1,2,... ,m) 


kxt 


( 10 ) 


where k has the similar size with n. Then, we introduce white 
noise into Dj to reduce the correlations among its own rows. 
The noise matrix is E = {eij}kxt, where Cij is random 
variable according with the normal distribution. Einally, the 
factor matrix is formulated by 


Cj='Dj+me,jE (j = 1, 2,..., m) (11) 


where mej is the magnitude of white noise for the factor 
matrix Cj. The signal-to-noise ratio (SNR) of the factor matrix 
Cj is defined as 


Tr(D,Df) 
Tr(EE^) X ml- 


{j = 1,2,... ,m) 


( 12 ) 


where Tr(-) is the trace of the matrix. The value of pj, 
requiring careful selections, will affect the performance of 
correlation analysis. In this paper, we set the same SNR for 
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all factors, to ensure the consistency of correlation analysis. 
Therefore, for each T)j{j = 1, 2,.. m), the value of nie.j is 
calculated by 




/ Tr(D,Df) 

Tr(EE'f^) X p 


(J = l,2, 


^) (13) 


For each factor Cj, we can construct an augmented matrix 
for parallel correlation analysis, formulated by 


A,= 


B 

C. 


(j = l,2,...,m) 


(14) 


Steps of Correlation Analysis for Power Systems 

1) In a study period, form the status matrix and factor vectors 

(f = 1, 2,.. ., m). 

2) At each sampling time, 

2a) acquire the standard matrix product Z from B. 

2b) calculate kmsr ^^YSR of 

3) Meanwhile, for each factor Cj, construct the augmented matrix Aj. 

4) At each sampling time, 

4a) acquire the standard matrix product Z from Aj. 

4b) calculate kmsr and ^^VSR of Z. 

4c) acquire the sample covariance matrix S from Z. 

4d) calculate /kde of S and draw the /kde — ^ curve. 

5) Draw kmsr —t curves of the status matrix and the augmented matrix 
for each factor. 


B. Status Data and Factor Data 

1) Status Variables of Power System: 

The operating status of power systems can be estimated 
by various types of status variables, such as frequencies, 
voltages, currents and power flows. One or more types of 
these status variables can form the status matrix B. In this 
paper, we use magnitudes of bus voltages as status data for 
the following considerations: 

a) The voltage magnitude is one of the most basic parameters in 
power systems. It is measurable and available on every bus with 
common measuring devices. Therefore, magnitudes of bus voltages 
have considerable redundancy and accuracy for correlation analysis. 

b) The voltage magnitude does not involve the topology of power 
systems. Therefore, we can conduct analysis without the a prior 
knowledge of network structures and parameters. 

2) Factors in Power Systems: 

The operating status of power systems is mainly affected 
by electrical factors, climatic factors and economic factors. 

a) Electrical factors-nodal loads and distributed generations, etc. 

b) Climatic factors—temperature, wind speed, light intensity, etc. 

c) Economic factors—electricity price, gross domestic product, etc. 

Since data normalization is used during data processing, 
factor data and status data in the augmented matrix can have 
different units and magnitudes. In consideration of different 
sampling frequencies for status data and factor data, it can be 
assumed that data with lower sampling frequency are the same 
values in a sampling interval. 

C. Correlation Analysis Method and Its Advantages 

Based on RMT and the augmented matrix method, a cor¬ 
relation analysis method for power systems is designed as 
following steps. 

Step 2) is conducted for the anomaly detection, where the 
kmsr — t curve of the status matrix discovers the signals 
in power systems i^. Step 3)^), analyzing the correlations 
between the system status and each factor, aim to determine 
the causes to the signals. During the analysis procedure, kmsr 
and /kde are calculated as correlation indicators. 

The proposed correlation analysis method is driven by 
measured data, and based on statistical theories. The procedure 
involves no mechanism models for power systems and factors; 
it eliminates the interference brought by assumptions and 


simplifications. Compared with model-based methods, the pro¬ 
posed method is data-driven, and universal for various factors. 
Meanwhile, the proposed method has strong robustness against 
random fluctuations in systems and measuring errors in data. 
Besides, the method is practical for real-time analysis by using 
a specific split-window. The advantages above will be verified 
in case studies. 

TV. Case Studies 

The proposed method is tested with simulated data in the 
standard IEEE 118-bus system. Detailed information of the 
system is referred to the easel 18.m in Matpower package 
and Matpower 4.1 User’s Manual ifl^ . In the simulations, we 
regard the active load of each bus as a factor; each change of 
a factor is considered as a signal. 

Eour cases are designed in different scenarios to validate 
the effectiveness of the proposed method. In case 1, case 2 
and case 3, three kinds of signals are added into single factor 
to affect the operating status of the test system. In case 4, 
multiple factors with different kinds of signal produce effects 
on the system status. 

In order to determine the causes to the signals, simulated 
data of each factor are used to conduct correlation analysis. 
It is a parallel analysis procedure, and we can only pay our 
attention to potential factors. Here, we just illustrate the results 
with the load of bus 117 and bus 54 to show the performance 
of the proposed method. 

Assume that status data are sampled at each time interval, 
while factor data are sampled every 50 time intervals. Besides, 
white noise is introduced into both status data and factor data 
to represent random fluctuations and measuring errors. 

Eor all cases, let n = 118, f = 1000, N = 118, T = 240, k = 
ilV = 59,p = 500. 

A. Correlation Analysis for Single Factor 

1) Case 1 - Step signal in the load of bus 117: 

In case 1, assumed signals for each factor are shown in 
Tab. Hn] The correlation analysis results are shown in Eig. |6] 
It is noted that the correlation analysis begins at tg = 240, 
because the real-time split-window needs T = 240 times of 
sampling data, including the present sampling and 239 times 
of historical sampling. 
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TABLE III: Assumed Signals for Each Eactor in Case 1 


Bus 

Sampling Time 

Active Load(MW) 


tg = 1 ~ 500 

20.0 


ts = 501 ~ 1000 

120.0 

Others 

tg = 1 ~ 1000 

Unchanged 


In Eig. |6a] based on the kmsr ~ t curve, we can detect 
signals based on analyses below: 

I. During the sampling time ts = 240 ~ 500 , kmsr of status matrix 
remains constant around 0.86, between the outer and inner circle; it 
means the system status is normal without signals. 

II. At ts = 501 , kmsr starts to decline (from 0.8638 to 0 . 7002 ), 
and deviates from the predicted ring (the inner radius is 0 . 7130 ); it 
means there exist signals that change the system status. 

III. kmsr increases back to 0.8662 at ts = 740 and remains constant 
inside the ring afterwards. 

Eirst, we can tell that the signals occur right at tg =501. 
Moreover, in our method, the impact of a signal to MSR will 
delay for a duration of T, due to historical sampling data in the 
real-time split-window. In Eig. |6al the signal area (U-shaped 
curve) is tg = 501 ^ 740, so we can calculate the actual 
duration of signals as 740 — 501 -I- 1 — T = 0. Therefore, we 
can speculate that there are instantaneous signals occurring at 
fs = 501. 

In order to find out the causes to above signals, we conduct 
correlation analysis for each factor. In Eig. |6b] when we 
augment load data of bus 117, kmsr of the signal area 
(tg = 501 ~ 740) decreases dramatically (from 0.7812 to 
0.2998), below the inner circle (0.5123); it indicates strong 
correlations between the load of bus 117 and the system 
status. On the other hand, in Eig. |6c] kmsr remains inside the 
ring throughout the signal area; it indicates poor correlations 
between the load of bus 54 and the system status. 

Besides, we can also determine the correlations by /kde — A 
curves in Eig. |4]and Eig. |5] In Eig. @1 at fg = 620 (inside the 
signal area), when we augment load data of bus 117, /kde 
deviates from /mp; it indicates strong correlations between the 
load of bus 117 and the system status. However, in Eig. |5l 
/kde with the load of bus 54 accords with /mp at tg = 620; 
it indicates poor correlations between the load of bus 54 and 
the system status. 

As a result, we deduce that the load of bus 117, but not 
bus 54, is the cause for instantaneous signals at fg = 501. This 
analysis result accords with assumed signals in Tab.lTIIl In fact, 
we only add signals to the active load of bus 117. Specifically, 
the active load of bus 117 increases from 20 MW to 120 MW 
right at fg = 501. 

2) Case 2 - Peak and dip signals in the load of bus 117: 

In case 2, assumed signals for each factor are shown in 
Tab. |IV] The correlation analysis results are shown in Eig. |7] 

TABLE IV: Assumed Signals for Each Eactor in Case 2 


Bus 

Sampling Time 

Active Load(MW) 


tg = 1 300 

60.0 


tg = 301 ~ 350 

120.0 

117 

tg = 351 ~ 650 

60.0 


tg = 651 ~ 700 

20.0 


tg = 701 ~ 1000 

60.0 

Others 

tg = 1 ~ 1000 

Unchanged 



(a) ts = 500 


(b) ts = 620 


Eig. 1: Eigenvalue distributions of standard matrix products in 
case 1: the data source is the status matrix. 



Eig. 2: Eigenvalue distributions of standard matrix products 
in case 1: the data source is the augmented matrix, including 
load data of bus 117. 



Elgenvalje-Inner Radius MSR|| 

Sampling Time=500 



digervaiue-Inner Radius MS^ 

Sampling Time=620 


(a) ts = 500 


(b) ts = 620 


Eig. 3: Eigenvalue distributions of standard matrix products 
in case 1: the data source is the augmented matrix, including 
load data of bus 54. 


In Eig. 1211 based on the kmsr ~ t curve, we can detect 
signals below; 

I. Erom fg = 301 to fg = 590, the kmsr —f curve is U-shaped 
beyond the predict ring. It indicates that the signals occur at 
fs = 301, and the signal area is fg = 301 ^590. 

II. Erom tg = 650 to tg = 940, the kmsr ~ 1 curve is U- 
shaped beyond the predict ring. It indicates that the signals 
occur at tg =650, and the signal area is fs = 650 ~940. 

In consideration of the delayed effect of a signal to MSR, 
we can calculate actual durations of above signals as 590 — 
301-bl-T = 50 and 940-651-f 1-T= 50. Therefore, we 
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Outer Radius—- InneVPadius MSR- VS'r] j-Oua'r Radius-Inner Radius MSR-VSR j p-Ouier Radius-Inner Radius MSR-VSR| 



Sampling Time Sampling Time Sampling Time 

(a) Data source: status matrix (b) Data source: augmented matrix including load(c) Data source: augmented matrix including load 

data of bus 117 data of bus 54 


Fig. 6: kmsr — t curves of standard matrix products in Case 1 


Sampling Time=500 



Sampling Time=620 



Fig. 4: kkde — A curves of standard matrix products in case 1: 
the data source is the augmented matrix, including load data 
of bus 117 


Sampling Time=500 



Sampling Time=620 



Fig. 5: kkde — A curves of standard matrix products in case 1: 
the data source is the augmented matrix, including load data 
of bus 54 


can speculate that there are continuous signals occurring at 


i^ = 301~351 and fs = 651~701. 

In Fig. Eb] when we augment load data of bus 117, kmsr 
inside two signal areas declines remarkably and deviates from 
the predicted ring (from 0.7858 to 0.3538 and from 0.7830 to 
0.3846); it indicates strong correlations between the load of 
bus 117 and the system status. On the other hand, in Fig. iTcl 
when we augment load data of bus 54, kmsr inside both signal 
areas remains in the predicted ring throughout the signal area; 
it indicates poor correlations between the load of bus 54 and 
the system status. As a result, we deduce that the load of bus 
117, but not bus 54, is the cause for continuous signals during 
t^ = 301-351 and ts = 651-701. 

Above analyses accord with assumed signals in Tab. |IV] In 
this case, we only add signals to the active load of bus 117. 
To be specific, the active load of bus 117 has a peak during 
fs = 301~351, and a dip during fs = 651~701. 

3) Case 3 - gradual signals in the load of bus 54: 

In case 3, assumed signals for each factor are shown in 
Tab.lv] The correlation analysis results are shown in Fig. 0 

TABLE V: Assumed Signals for Each Eactor in Case 3 


Bus 

Sampling Time 

Active Load(MW) 


ig = 1 ~ 300 

113.0 


ts = 301 ~ 350 

135.6 


ts = 351 ~ 400 

158.2 


ts = 401 ~ 450 

180.8 

54 

ts = 451 ~ 500 

203.4 

ts = 501 ~ 550 

226.0 


ts = 551 ~ 600 

248.6 


ts = 601 ~ 650 

271.2 


ts = 651 ~ 700 

293.8 


ts = 701 ~ 1000 

316.4 


Others ts = 1 1000 Unchanged 


In Eig. the U-shaped curve is from fg = 301 to tg = 940. 
It indicates signals occurring at tg = 301. In consideration of 
the delayed effect of a signal to MSR, we can calculate actual 
durations of above signals as 940—301+1—r=400. Therefore, 
we can speculate that there are continuous signals occurring 
at fg=301-701. 

In Eig. [8c| when we augment load data of bus 54, kmsr 
inside the signal area declines remarkably and deviates from 
the predicted ring (from 0.7803 to 0.4227); it indicates strong 
correlations between the load of bus 54 and the system status. 
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On the other hand, in Fig. 180 when we augment load data of 
bus 117, kmsr remains in the predicted ring throughout the 
signal area; it indicates poor correlations between the load of 
bus 117 and the system status. As a result, we deduce that the 
load of bus 54, but not bus 117, is the cause for continuous 
signals during fs = 301^701. 

Above analyses is in accordance with assumed signals in 
Tab. |V] In this case, we only add signals to the active load of 
bus 54. In detail, the active load of bus 54 increase gradually 
during fg = 301 ~ 701. 

B. Correlation Analysis for Multiple Factors 

In case 4, assumed signals for each factor are shown in 
Tab. [yi] The correlation analysis results are shown in Fig. |9] 

TABLE VI: Assumed Signals for Each Eactor in Case 4 


Bus 

Sampling Time 

Active Load(MW) 


ts = 1 ~ 300 

60.0 


tg = 301 ~ 350 

120.0 

117 

tg = 351 - 650 

60.0 


tg = 651 - 700 

20.0 


tg = 701 - 1000 

60.0 


ts = 1 - 300 

113.0 


tg = 301 - 350 

135.6 


tg = 351 - 400 

158.2 


tg = 401 - 450 

180.8 

54 

tg = 451 ~ 500 

203.4 

tg = 501 ~ 550 

226.0 


tg = 551 ~ 600 

248.6 


tg = 601 ~ 650 

271.2 


tg = 651 - 700 

293.8 


tg = 701 ~ 1000 

316.4 

Others 

ts = 1 ~ 1000 

Unchanged 


Based on the kmsr —f curve in Eig.|9al two kinds of signals 
are detected: 

I. Two U-shaped curves are found during tg = 301 ~ 590 
and ts = 651 ~ 940. Referring to the analysis in case 2, it 
indicates two continous signals during tg = 301 ^ 351 and 
4 = 651-701. 

I. The Third curve are found during 4 = 301 — 940. 
Referring to the analysis in case 3, it indicates continuous 
signals during 4 = 301 — 700. 

During the first two signal areas (tg = 301 — 590 and tg = 
651 — 940), when we augment the load of bus 117, kmsr 
deviates from Ring Law, shown in Eig. During the third 
signal area (4 = 301 — 940), when we augment the load of 
bus 54, the deviation of /vmsr is shown in Eig. As a result, 
we can achieve following speculations: 

I. The load of bus 117 affects the system status during 4 = 
301-351 and 4=651-701. 

II. The load of bus 54 affects the system status during 4 = 
301-700. 

These speculations are validated by Table. IVTl 

C. More Discussions in Details 

Through above four cases, the effectiveness and perfor¬ 
mance of the correlation analysis method is verified under 
different scenarios. However, there are still some interesting 
details left. 


Eirst, we can observe that kmsr ~ t curves in case 4 are 
approximately superposed by corresponding curves in case 2 
and case 3. In fact, the assumed signals in case 4 are combined 
with those in case 2 and case 3. In conclusion, there should 
be some relationships between phenomena in power systems, 
and LES in mathematics. 

Secondly, when data become more correlated, the VSR 
increases as well. This phenomenon can be explained based 
on the eigenvalue distributions of standard matrix products. 
In Eig. [Ta] Eig. |2a] and Eig. at tg = 500, the eigenvalues 
distribute between the outer circle and the inner circle, con¬ 
forming to the Ring Law; it indicates poor correlations in data. 
During the signal area (such as 4 = 620), when we augment 
data of correlated factors, all the eigenvalues gather towards 
the circle center, shown in Eig. |2b] When we augment data of 
irrelevant factors, only some of the eigenvalues gather towards 
the circle center, shown in Eig. 

Thirdly, to represent random fluctuations in the system and 
measuring errors in sampling data, white noise is introduced 
into both status data and factor data throughout the simulation. 
Observations in case studies indicate that kmsr does not 
change dramatically in a system dominated by white noise. 
Therefore, kmsr is a reliable indicator to identify signals from 
random fluctuations and measuring errors. 

V. Conclusion 

This paper proposes a data-driven method to reveal the 
correlations between factors and the power system status. Eirst, 
for each factor, we construct an augmented matrix as a data 
source, combining status data and factors data reasonably. 
Secondly, to conduct real-time analysis, we use a specific split- 
window to obtain the raw data matrix at each sampling time. 
Then, we conduct correlation analysis for the raw data matrix 
based on random matrix theory (RMT). The mean spectral 
radius (MSR), a kind of linear eigenvalue statistics (LESs), 
is calculated as a correlation indicator; the kernel density 
estimation (KDE) is used as a assisted tool to reveal data 
correlations. The proposed method is direct, robust against 
bad data and universal to varieties of factors. Case studies 
demonstrate the effectiveness of the method for both single 
factor and multiple factors. 

The current work is only a preliminary exploration of 
correlation analysis based on RMT. Much more researches are 
needed along this direction. The degree of correlations needs 
to be further quantified. Eor example, in a study period, we 
can use the integration of MSR to quantify the correlations. 
Besides, we can use data of subarea to construct the basic 
matrix; in this way we can fix the data missing problems. 
Eurthermore, the proposed method can be used to reveal 
the correlation between any two types of variables, as long 
as combining their data reasonably in the data source. Our 
method can be applied for specific issues in power systems, 
such as voltage stability, weak buses identification, abnormal 
and fault diagnosis. Combinations with model-based methods 
and data processing methods will improve the performance, 
and uncover more connections between electrical phenomena 
and statistical characteristics. 
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data of bus 117 data of bus 54 

Fig. 7: kmsr ~ t curves of standard matrix products in Case 2 



(a) Data source: status matrix (b) Data source: augmented matrix including load(c) Data source: augmented matrix including load 

data of bus 117 data of bus 54 


Fig. 8: kmsr — t curves of standard matrix products in Case 3 



Outsr Radius-Inner Radius MSR-VSR [ 



(a) Data source: status matrix (b) Data source: augmented matrix including load(c) Data source: augmented matrix including load 

data of bus 117 data of bus 54 


Fig. 9: kmsr — t curves of standard matrix products in Case 4 


Appendix A 

Random Matrix Theory 


A. Ring Law 


matrix product as 


L 

z = l[Xu^ 

2=1 


(16) 


Let Xs be a standard non-F[ermitian random matrix, 

whose entries are independent identically distributed (i.i.d.) 
variables with 

p(X,)=0,a2(X,) = l ii = l,2,...,N) (15) 

where x,; = {xi^i,Xi^ 2 ,- ■ For multiple standard non- 

Hermitian random matrices (i = 1, 2,..., L), we define the 


where X„_i is the singular value equivalent of X^. The matrix 
product Z can be converted to the standard matrix product 
Z, whose cr^(zi) = ^ in each row. According to Ring Law 
Ell] , the BSD of Z converges almost surely to the limit 
with a probability density function (PDF) given by 


/rl(Az) 


(l-c)^<|Ag|<l 
0 otherwise 


(17) 
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as iV,T —>■ oo with a constant ratio c = ^ G (0,1]. On the [10] 
complex plane of eigenvalues, the inner circle radius is (1—c )'2 
and the outer circle radius is unity. 


B. Marchenko-Pastur Law 


[ 11 ] 


Let X={a:i jItvxt be a random matrix, whose entries are 
i.i.d. variables with 


p(xi) = 0,cr^(xi) = d = (18) 


[ 12 ] 


where d < cxd is a constant, and x^ = ■ - jXi ^ T )- The 

sample covariance matrix of X is defined as 

S = 4XX" (19) 

^ [13] 

According to M-P Law imi, the BSD of S converges 
to the following PDF 

/mp(As) = | 2¥sW\/(^-^s)(As-a) a^Xs<b 
I 0 otherwise 

as N,T —>• oo with a constant ratio c = ^ € (0,1], where 
a = d(l — y/c)^, b = d(l + 
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